Abstract

There has recently been much interest in the development of software
validation tools for FORTRAN. Such tools are usually designed to analyze
programs written in ANSI standard FORTRAN. However, because there are
many dialects and extensions of FORTRAN in use, it would be desirable to
analyze these as well. One solution is to develop a single diagnostic
tool for standard FORTRAN which may be easily modified to accept variants
of the language. Since most of the variations occur at lexical and
syntactic levels, the design of a flexible lexical analyzer is a key
issue. The FSCAN Lexical Analyzer Generating System has been designed
with this purpose in mind. This report describes the FSCAN language, a
compiler for the language, and an interpreter for the resulting object
code. An example of a complete FSCAN program is included.
I.  INTRODUCTION

The first phase of the analysis of a computer program written in
some programming language is "lexical analysis" or "scanning", where
the source text is broken up into the words or "tokens" of the programming
language. For most languages this is a relatively straightforward
task, as spaces or some other delimiter are required at any token separation
points that could be ambiguous. Unfortunately the ANSI FORTRAN standard
specifies that spaces for the most part are meaningless in FORTRAN
programs [1]. This creates several ambiguous situations that cannot be
resolved without backtracking by a left to right scan with single character
lookahead of the source text. For example, if the string 'DO' has
been read, it is unclear whether the scan has reached the end of the
keyword, 'DO', in a statement such as

DO 10 I = 1, 3

or whether the scan is in the middle of a variable name in a statement
such as

DO1 = 5 * X
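
The ambiguity can be illustrated with a minimal Python sketch. It is not part of FSCAN; the function name `classify_do` and the simplified rule it applies (a DO statement has a comma after the equal sign, outside parentheses) are assumptions made only for illustration.

```python
import re

def classify_do(statement: str) -> str:
    """Classify a FORTRAN statement that begins with the letters DO."""
    # Blanks are not significant, so remove them before looking at the text.
    text = statement.replace(" ", "").upper()
    # Simplifying assumption: a DO statement has the shape
    #   DO <label> <variable> = <expr> , <expr> ...
    # so a comma after '=' (outside parentheses) signals the keyword.
    m = re.match(r"DO(\d+)([A-Z][A-Z0-9]*)=(.*)$", text)
    if m:
        depth = 0
        for ch in m.group(3):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "," and depth == 0:
                return "DO statement"
    return "assignment"

print(classify_do("DO 10 I = 1, 3"))
print(classify_do("DO 10 I = 1.3"))
```

The two calls differ only in one character near the end of the statement, which is why a scan with single character lookahead cannot decide at the point where 'DO' has just been read.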

The problem of the lexical analysis of FORTRAN is further complicated
by the existence of numerous dialects and extensions of FORTRAN that vary
according to the installation and particular compiler in use. The problem
is therefore most acute for a system such as the DAVE software validation
system [2] where it is desirable that all variants of FORTRAN
be readable. Ordinarily this would entail recoding the lexical analyzer
module for each new FORTRAN variant, in addition to maintaining a library
of already coded lexical analyzer modules.

To minimize these tasks, the FSCAN (Fortran SCANner) Lexical Analyzer
Generating System was developed. The FSCAN system consists of a
language, a compiler for the language, and an interpreter for the object
code produced by the FSCAN compiler.

II.  THE LANGUAGE

The FSCAN language (henceforth referred to simply as "FSCAN") was
designed to allow the specification of a complex lexical analyzer, such
as that required by FORTRAN, in as concise and understandable a manner as
possible.
A.  Programs and Procedures

An FSCAN program consists of a single FSCAN procedure (within which
may be defined additional procedures). An FSCAN procedure specifies in
an extended BNF-style notation a grammar that describes a left to right
pass over the source text. Within the grammar, actions such as the generation
of a token are indicated.

Syntax 

An FSCAN procedure consists of a sequence of grammatical rules which
are delimited by the keywords, 'SCANNER' and 'END'. Following each of
these keywords is the goal symbol for the sequence of rules; this also
serves as the name of the procedure. The redundant repetition of the
goal symbol is used by the FSCAN compiler to ensure that the 'SCANNER' -
'END' pairs are matched in the way the programmer intended.

Example 

SCANNER DIG :

rule 1; rule 2; ...; rule n;

END DIG

Semantics

The rule indicated by the goal symbol of a procedure specifies an
LR(1) parse of the source text which is performed when the procedure is
called. The parse is performed in a longest match manner; namely, given
the choice between finishing and parsing more of the source text, the
procedure will always continue parsing.
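
The longest match rule can be sketched in a few lines of Python. This is only an illustration of the principle, not the FSCAN interpreter; `longest_match` and its `can_continue` argument are hypothetical names.

```python
def longest_match(text, start, can_continue):
    """Return the end index of the longest stretch of text, beginning at
    start, for which can_continue(prefix) stays true."""
    end = start
    # Given the choice between finishing and consuming one more character,
    # always consume the character.
    while end < len(text) and can_continue(text[start:end + 1]):
        end += 1
    return end

# A procedure that accepts digit strings consumes all three leading digits.
print(longest_match("123+4", 0, str.isdigit))
```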

B.  Rules 

An FSCAN rule is either a macro rule, a variable defining rule, or
a procedure  rule.  The  scope  of  rule  definitions  corresponds  to  that  of 
ALGOL. 

1.  Macro  Rules 

As in a BNF rule, the left side of a macro rule is a nonterminal
while the right side is a sequence of alternatives. The extensions of
FSCAN are that each alternative may optionally have an associated action,
and that an alternative, rather than being simply a sequence of terminals
and nonterminals, may contain any of a variety of regular expression
style operators as well as parentheses for grouping.
Syntax

Each alternative is preceded by a single-right-arrow (->). The
optional action is placed at the end of the corresponding alternative
and is preceded by a double-right-arrow (=>).

Example 

TEXT -> fscan_reg_exprn 1 => action 1
     -> fscan_reg_exprn 2
     -> fscan_reg_exprn 3 => action 2

Semantics 

A macro rule is a standard macro in that the right part of the rule
textually replaces any occurrence of the nonterminal of the left part,
when the occurrence is in an FSCAN regular expression within the scope
of the macro rule definition. A macro rule cannot be recursively defined.
Thus in the above example, the nonterminal, TEXT, could not appear
in any of the three FSCAN regular expressions in the right part. During
execution when any of the alternatives has successfully been matched
with the source text, the corresponding action, if any, is performed.

The compiler ensures at compile time that during execution of the object
code it is determinable which action, if any, is to be performed by examining
the next source text character only.

2.  Variable  Defining  Rules 

A variable  defining  rule  is  similar  in  form  to  a macro  rule  except 
that  the  right  side  is  restricted  to  being  a single  alternative.  The 
nonterminal  on  the  left  side  names  the  variable  being  defined,  in 
addition  to  naming  the  regular  expression  on  the  right  side,  as  in  a 
macro  rule. 

Syntax 

The single alternative is preceded by an equal sign (=).

Example 

HCONST = fscan_reg_exprn

Semantics

A variable  is  used  to  convey  numeric  information  from  the  source 
text  to  the  FSCAN  program.  Its  semantics  correspond  to  those  of  a macro 
rule  except  that  an  implicit  "evaluation-action"  is  attached  to  the 
single  alternative  of  the  right  part.  When  executed  this  action  evaluates 
the  string  processed  by  the  right  side  of  the  variable  defining  rule. 
The number produced is stored as the value of the variable defined by
that rule. The variable can then be used in FSCAN contexts where integers
are expected, in which case no macro substitution occurs, but
rather, during execution the integer value is that produced by the most
recent execution of that variable's evaluation action. The compiler ensures
that it is always possible to derive an integer from strings
matched by the right part of a variable defining rule.
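
One plausible use of a variable is a FORTRAN Hollerith constant such as 5HHELLO, where a leading count determines how many characters follow. The report does not state that this is what HCONST is for, so the Python sketch below rests on that assumption; `scan_hollerith` is a hypothetical name.

```python
def scan_hollerith(text, pos=0):
    """Scan <count>H<count characters>; return (characters, new position)
    or None if the text at pos is not a Hollerith constant."""
    start = pos
    while pos < len(text) and text[pos].isdigit():
        pos += 1
    if pos == start:
        return None
    # Plays the role of the implicit "evaluation action": the matched digit
    # string is converted to an integer and remembered.
    count = int(text[start:pos])
    if pos >= len(text) or text[pos] != "H":
        return None
    pos += 1
    # The remembered value is then used where an integer is expected,
    # here as the number of characters to take.
    if pos + count > len(text):
        return None
    return text[pos:pos + count], pos + count

print(scan_hollerith("5HHELLO+1"))
```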

3.  Procedure  Rule 

A procedure  rule  is  simply  an  FSCAN  procedure,  see  II.  A. 

C.  FSCAN  Regular  Expressions  (abbreviation  : FRE) 

1.  Atomic  units 

The  atomic  units  of  an  FRE  are  terminals,  nonterminals,  and  integers. 

a.  Terminals 
Syntax 

A terminal  is  either  a "kept-string"  or  a "deleted  string".  A kept- 
string  is  a sequence  of  characters  enclosed  in  double  quotes  (")  while 
a deleted  string  is  a sequence  of  characters  enclosed  in  single  quotes 
(').  If  a sharp  (#)  appears  in  the  string,  the  sharp  is  ignored  and 
the  next  character  is  treated  as  the  next  character  of  the  string,  even 
if  that  character  is  a double-quote,  single-quote,  or  a sharp.  For 
terminals  the  strings  are  restricted  to  be  of  length  one. 

Examples 

'A'  "i"  "#""

Semantics 

The character of the terminal is compared with the next character
of the source text. If they match, the source text character is marked
as "kept" or "deleted", depending on whether the terminal is a kept-string
or a deleted-string. The FSCAN compiler will indicate (with an appropriate
error message at compile time) if it is ever possible for a given FSCAN
program to mark a source text character simultaneously as "kept" and
"deleted".
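
The effect of the two kinds of terminal can be sketched as follows. The representation of a terminal as a (character, kept) pair and the name `match_terminals` are assumptions made for this illustration only.

```python
def match_terminals(text, terminals):
    """Match text against a sequence of one-character terminals.

    Each terminal is a (character, kept) pair: kept=True stands for a
    kept-string, kept=False for a deleted-string. Returns the kept
    characters on success and None on a mismatch."""
    if len(text) < len(terminals):
        return None
    kept = []
    for source_char, (char, is_kept) in zip(text, terminals):
        if source_char != char:
            return None
        if is_kept:
            kept.append(source_char)   # marked "kept"
        # otherwise the character is marked "deleted" and dropped
    return "".join(kept)

# Deleted quotes around a kept letter: only the letter survives.
quoted_a = [("'", False), ("A", True), ("'", False)]
print(match_terminals("'A'", quoted_a))
```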

b.  Integers 
Syntax 

An  integer  is  a string  of  digits. 

Examples 

53  0 05 
Semantics

Integers  have  their  usual  meaning. 

c.  Nonterminals 
Syntax 

A nonterminal  is  a sequence  of  letters  and  digits,  the  first  of 
which  is  a letter,  that  is  terminated  by  a character  that  is  neither  a 
letter  nor  a digit. 

Examples 

A  TEMP  TEMP1  B3B

Semantics

Nonterminals  can  name  macro  rules,  variables,  or  procedure  rules. 

As mentioned earlier, macro rule names are textually replaced by the
right part of the macro defining rule, for which the semantics have been
described. The semantics of variable names vary according to their context.
If a variable is used where an integer is expected, the current
value of the variable is used during execution; otherwise, the right part
of the variable definition (with implicit associated "evaluation action")
textually replaces the use of the variable name. When the nonterminal
names a procedure, the appropriate procedure is called during execution.
The compiler ensures at compile time that at any point in execution, it
is determinable from the character presently being examined, whether to
invoke a procedure, and which one to invoke.

2.  Operations 

The operations from which FSCAN regular expressions are composed
can be divided into two types: basic operations, and extended operations
that can be defined in terms of the basic operations. Let A, B, C be
FRE's and let N be a variable or integer.


a.  Basic Operations
Syntax

Alternation   : A | B
Concatenation : A B C
Repetition    : A*
Negation      : NOT A

Example

(NOT "?") ('x'*)
Semantics

An  alternation  successfully  matches  the  source  text  if  any  of  its 
alternates  does.  A concatenation  matches  the  source  text  if  its  operands 
sequentially match the source text. A repetition matches an arbitrary
number (possibly zero) of occurrences of its operand with the source text. The operand
of  a negation  is  restricted  to  regular  expressions  that  specify  a set 
of  characters,  all  of  which  are  kept-strings  or  all  of  which  are  deleted 
strings.  A negation  then  matches  any  character  that  is  not  in  its 
operand's  character  set.  If  matched,  a source  character  is  marked  as 
"kept"  or  "deleted"  if  the  operand  character  set  consists  of  kept-strings 
or  deleted-strings,  respectively. 
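
The four basic operations can be sketched as small Python matcher functions. Each matcher takes the source text and a position and returns the new position, or None on failure. The names `term`, `alt`, `cat`, `star` and `neg` are hypothetical, and `alt` tries its operands in order, which is a simplification: FSCAN decides between alternatives from the next character only.

```python
def term(ch):
    """Terminal: match exactly one given character."""
    return lambda s, i: i + 1 if i < len(s) and s[i] == ch else None

def alt(*matchers):
    """Alternation: succeeds if any operand does (tried in order here)."""
    def match(s, i):
        for m in matchers:
            j = m(s, i)
            if j is not None:
                return j
        return None
    return match

def cat(*matchers):
    """Concatenation: operands must match one after another."""
    def match(s, i):
        for m in matchers:
            i = m(s, i)
            if i is None:
                return None
        return i
    return match

def star(m):
    """Repetition: zero or more occurrences of the operand."""
    def match(s, i):
        while True:
            j = m(s, i)
            if j is None or j == i:
                return i
            i = j
    return match

def neg(chars):
    """Negation: any single character not in the operand's character set."""
    return lambda s, i: i + 1 if i < len(s) and s[i] not in chars else None

# A sharp, any run of non-sharp characters, and a closing sharp.
comment = cat(term("#"), star(neg("#")), term("#"))
print(comment("#hi# x", 0))
```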

b.  Extended  Operations 
Syntax 


Operator   Form                        Equivalent to

+          A+                          A (A*)
?          A?                          A | e    (e is the empty string)
LIST       A LIST B                    A (B A)*
ELSE       A ELSE B ELSE C ELSE ...    A | B | C | ...
**         A ** N                      A A A ... A    (N times)
?*         A ?* N                      A? A? A? ... A?    (N times)

Restrictions:  The  operands  of  ELSE  and  the  first  operands  of  **  and 
?*  are  restricted  to  being  the  names  of  procedures. 

Semantics 

The semantics of the extended operations are largely determined by
those of the basic operations by which they are defined. In addition,
though, the ELSE construct provides a "backup and restore" feature
where if the first operand fails to successfully match the source text,
the second operand is tried, etc. Also the ?* operator provides limited
backup in the sense that, if fewer than N A's have been successfully
matched, the parse is backed up to the state at which the last A
(possibly no A's) has been successfully matched.
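
The backup behavior of ELSE and ?* can be sketched in Python, with a procedure modeled as a function from (text, position) to a new position or None. All names are hypothetical. The zero exponent case of ** is approximated by requiring one full match of A without consuming it, which agrees with a one-character lookahead only when A matches a single character.

```python
def literal(word):
    """Hypothetical stand-in for a procedure that matches a fixed string."""
    return lambda s, i: i + len(word) if s.startswith(word, i) else None

def else_(*procs):
    """A ELSE B ELSE C: try each procedure from the same saved position."""
    def match(s, i):
        saved = i                      # save the scanner state
        for proc in procs:
            j = proc(s, saved)         # restore it before every attempt
            if j is not None:
                return j
        return None
    return match

def power(proc, n):
    """A ** N: exactly n occurrences of A."""
    def match(s, i):
        if n == 0:
            # Approximation: check that A could start here, consume nothing.
            return i if proc(s, i) is not None else None
        for _ in range(n):
            i = proc(s, i)
            if i is None:
                return None
        return i
    return match

def upto(proc, n):
    """A ?* N: at most n occurrences of A, backing up to the end of the
    last A that matched completely."""
    def match(s, i):
        for _ in range(n):
            j = proc(s, i)
            if j is None:
                break                  # position after the last complete A
            i = j
        return i
    return match

ab = literal("AB")
print(upto(ab, 3)("ABABAX", 0))
```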

c.  Actions 
Syntax 

Actions  are  either  kept-strings,  deleted-strings,  integers,  or 
nonterminals. 

Examples

"INT"  'REAL'  8  203  CARDS  RESCAN

Semantics

A string or an integer indicates that a token is to be output. For
a string, the type of the token output is indicated by a unique integer
associated at compile time with that string; for an integer, the type of
token is indicated during execution by outputting the value (e.g., "8"
or "203") of the integer action. Also output during execution is the
sequence of kept characters that were matched by the alternative corresponding
to the action being performed. Actions that are deleted-strings
indicate that their corresponding alternatives only mark characters
as deleted, and thus it is sufficient to simply generate the
token type when the action is performed.
(Note: A program cannot contain both integer and string actions.)

A nonterminal action indicates that the sequence of kept characters matched
by that action's alternative is to be rescanned by the FSCAN procedure
named by the nonterminal. This process of rescanning is sometimes
referred to as "screening".


III.  EXAMPLE  OF  A COMPLETE  FSCAN  PROGRAM 

This FSCAN program specifies the scanner used by the FSCAN compiler,
i.e., it performs the lexical analysis of an FSCAN program.


SCANNER FSCAN :

FSCAN -> (' '* (KEYWORD ELSE NAME / INTEGER / KSTRING / DSTRING /
          DELIMITER / OPERATOR / COMMENT))* ;

SCANNER KEYWORD :

KEYWORD -> KEYWD NOTACHAR**0 ;

KEYWD -> 'S' 'C' 'A' 'N' 'N' 'E' 'R' => 8
      -> 'E' 'N' 'D' => 7
      -> 'E' 'L' 'S' 'E' => 9
      -> 'L' 'I' 'S' 'T' => 14
      -> 'N' 'O' 'T' => 19 ; END KEYWORD ;

SCANNER NAME :

NAME -> KACHAR (KACHAR / KDIGIT)* => 20 ; END NAME ;

INTEGER -> KDIGIT+ NOTD**0 => 21 ;

KSTRING -> DQ (NOTDQSH / SHARP KC)* DQ => 22 ;

DSTRING -> SQ (NOTSQSH / SHARP KC)* SQ => 23 ;

DELIMITER -> ':' => 2
          -> ';' => 3
          -> '(' => 5
          -> ')' => 6 ;

OPERATOR -> '-' '>' => 4
         -> '/' => 10
         -> '=' NOTRAB**0 => 11
         -> '=' '>' => 12
         -> '?' NOTAST**0 => 13
         -> '?' '*' => 15
         -> '+' => 16
         -> '*' NOTAST**0 => 17
         -> '*' '*' => 18 ;

COMMENT -> SHARP (NOT SHARP)* SHARP ;

KACHAR -> "A"/"B"/"C"/"D"/"E"/"F"/"G"/"H"/"I"/"J"/"K"/"L"/"M"/
          "N"/"O"/"P"/"Q"/"R"/"S"/"T"/"U"/"V"/"W"/"X"/"Y"/"Z" ;

SCANNER NOTACHAR :

NOTACHAR -> NOT ('A'/'B'/'C'/'D'/'E'/'F'/'G'/'H'/'I'/'J'/'K'/'L'/'M'/
                 'N'/'O'/'P'/'Q'/'R'/'S'/'T'/'U'/'V'/'W'/'X'/'Y'/'Z') ; END NOTACHAR ;

KDIGIT -> "0"/"1"/"2"/"3"/"4"/"5"/"6"/"7"/"8"/"9" ;

SCANNER NOTD :

NOTD -> NOT KDIGIT ; END NOTD ;

DQ -> '"' ; SQ -> '#'' ; SHARP -> '##' ;

NOTDQSH -> NOT("##"/"#"") ; NOTSQSH -> NOT("##"/"'") ;

SCANNER NOTRAB :

NOTRAB -> NOT '>' ; END NOTRAB ;

SCANNER NOTAST :

NOTAST -> NOT '*' ; END NOTAST ;

KC -> (NOT " ")/" " ;

END FSCAN


LINE NUMBER    INTERPRETATION

     1         The top level procedure, and therefore the program,
               is called FSCAN.

     2         The scanner accepts a sequence of KEYWORD's, NAME's,
               INTEGER's, etc., each of which can be preceded by
               an arbitrary number of spaces. As KEYWORD's and
               NAME's cannot be differentiated by an SLR(1) process,
               the ELSE operation must be used to allow the acceptance
               of either. Note that KEYWORD being the first
               operand of ELSE will cause a string that could be
               accepted as either a KEYWORD or a NAME to be accepted
               as a KEYWORD.

     3         A KEYWORD is a KEYWD followed by some nonalphabetic
               (NOTACHAR) character. Note that the exponent of zero
               indicates that although it is checked that the following
               character is nonalphabetic, no (zero) nonalphabetic
               characters are actually processed at this point.
               In case the following character were alphabetic, the
               KEYWORD scanner would fail, and the alternative, NAME,
               would be invoked at the point in the input where the
               KEYWORD scanner had been initiated.

     4         KEYWD will accept, and mark as deleted, the strings
               "SCANNER", "END", "ELSE", "LIST", and "NOT", and will
               output tokens numbered 8, 7, 9, 14, and 19, respectively.

     5         NAME will accept, and mark as kept, an alphabetic
               character followed by an arbitrary number of alphanumeric
               characters. Token number 20 will then be output,
               as well as the sequence of kept characters marked
               by NAME.
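
The interplay of KEYWORD, NAME and ELSE described above can be sketched in Python. The token numbers are those given in the interpretation; the function names, the returned triple (token type, kept characters, new position), and the treatment of end of input as a nonalphabetic character are assumptions of this sketch.

```python
LETTERS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
DIGITS = set("0123456789")
KEYWORDS = {"SCANNER": 8, "END": 7, "ELSE": 9, "LIST": 14, "NOT": 19}
NAME_TOKEN = 20

def scan_keyword(s, i):
    """KEYWD followed by a nonalphabetic character that is checked but
    not consumed (the zero exponent). Keyword characters are deleted."""
    for word, token in KEYWORDS.items():
        j = i + len(word)
        if s.startswith(word, i) and s[j:j + 1] not in LETTERS:
            return token, "", j
    return None

def scan_name(s, i):
    """A letter followed by letters and digits, all kept."""
    if s[i:i + 1] not in LETTERS:
        return None
    j = i + 1
    while j < len(s) and (s[j] in LETTERS or s[j] in DIGITS):
        j += 1
    return NAME_TOKEN, s[i:j], j

def keyword_else_name(s, i):
    """KEYWORD ELSE NAME: if KEYWORD fails, NAME restarts at position i."""
    return scan_keyword(s, i) or scan_name(s, i)

print(keyword_else_name("END FSCAN", 0))
print(keyword_else_name("ENDING ", 0))
```

In the second call the KEYWORD attempt reads 'END', finds an alphabetic character next, and fails; NAME then starts again from the beginning of the string.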


The rest of the program is interpreted in an analogous fashion. It thus
provides a rigorous and complete specification of the lexical analysis of
FSCAN programs.

The abbreviated nonterminals are to be read as follows:

KSTRING  : kept string

DSTRING  : deleted string

NOTACHAR : not an alphabetic character

KACHAR   : kept alphabetic character

KDIGIT   : kept digit

NOTD     : not a digit

DQ       : double quote

NOTDQSH  : not a double quote or a sharp

KC       : kept character

SQ       : single quote

NOTSQSH  : not a single quote or a sharp

NOTRAB   : not a right angle bracket

NOTAST   : not an asterisk

With this information, the sample FSCAN program also provides a structured
and understandable description for a human reader.

IV.  PROGRAMMING  HINTS  FOR  FSCAN 

Much of FSCAN programming is similar to writing a grammar for some
parser generator. The regular expression-style operators are, for the
most part, straightforward extensions. The distinction, though, between
a procedure and a macro rule, i.e., "SCANNER A : A -> B C ; END A" vs.
"A -> B C", does not correspond to any grammatical concepts, but rather to
the normal programming language concepts of a procedure and a macro.

In particular, a macro (when used in more than one place) causes a
larger object program to be generated (as a copy of the macro is inserted
at each use of the macro) while a procedure executes more slowly
(due to the overhead of the procedure call and return). An additional
distinction that is important for programming is that while only one
procedure can be executing at any particular time, several macro rules
can conceptually be executing in parallel.

The "ELSE" operator involves considerably more overhead than the
"|" operator in that the state of the scanner must be saved so that it
can be restored in case a particular alternative of the "ELSE" operator
fails, implying the next alternative must be tried. In contrast, the
"|" operator conceptually applies all of its alternatives in parallel.
Thus whenever possible, the "|" operator should be used for the sake
of efficiency.

The "**" and "?*" operators are conceptually straightforward, except
possibly for the following two characteristics. First, "A**0" indicates
that the next character in the input is checked for a match with a
legal first character of A, but that A does not actually process any
characters, due to the exponent of 0. Second, the "?*" operator involves
the same overhead as the "ELSE" operator, since "A ?* 5" must
have the ability to back up to the state of the scanner after the third
A was accepted, in case the entire fourth A could not be matched.


V.  IMPLEMENTATION  DETAILS 
Compiler.  The complete FSCAN compiler runs in 36,000 (decimal)
words on a CDC 6400 machine. The compile time for an FSCAN program
for ANSI FORTRAN is 28 seconds. The size of the object code (tables)
produced for this scanner is 1400 (decimal) words.

The  compiler  is  written  in  machine-independent  standard  ANSI 
FORTRAN,  with  the  following  exceptions: 

1.  Certain  non-standard  functions  are  assumed: 

a.  IAND (A,B), IOR (A,B), INOT (A)

These  should  return  the  respective  bitwise  logical  opera- 
tion on  their  arguments 

b.  LRS  (A, I),  LLS  (A, I) 

These  should  return  the  logical  binary  right  and  left 
shift,  respectively,  of  the  argument  A by  the  integer 
amount  I,  with  zero  fill. 

c.  INTGER  (A) 

The argument A is a character stored in 1H or A1 format
(assumed equivalent). The result is an integer
such that:

1.  0 < INTGER (A) <= # distinct characters

2.  INTGER (A) = INTGER (B) iff A is the same char. as B

3.  INTGER (1Hx) - INTGER (1Hy) = x-y if x, y are digits,
e.g., INTGER (1H7) - INTGER (1H3) = 4

d.  ENDFIL  (I) 

This  returns  true  iff  logical  unit  I is  at  end  of  file. 


2.  It is assumed that the # characters <= 2 * (# bits in a word).
If this is not the case, the bit vector module would have to
be altered to represent a vector as more than 2 words.

Note: This machine dependency is being replaced by the requirement
that on importation to a new machine, the constants
NMBITS and NMCHRS be initialized to correspond to the new
machine. For CDC, the initialization is
DATA NMBITS/60/, NMCHRS/64/
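
The two-word assumption can be illustrated with a Python sketch of a character set held as a bit vector. The constants are the CDC values quoted above; the layout (character code c stored at bit (c-1) mod NMBITS of word (c-1) div NMBITS) and the names `make_set` and `member` are assumptions, since the report does not describe the bit vector module itself.

```python
NMBITS = 60   # bits in a word (CDC value from the text)
NMCHRS = 64   # number of distinct characters (CDC value from the text)

def make_set(codes):
    """Build a two-word bit vector from character codes 1..NMCHRS."""
    words = [0, 0]
    for c in codes:
        if not 1 <= c <= NMCHRS:
            raise ValueError("character code out of range")
        w, b = divmod(c - 1, NMBITS)
        words[w] |= 1 << b            # IOR with a left-shifted bit (LLS)
    return words

def member(words, c):
    """Test membership with a right shift (LRS) and a mask (IAND)."""
    w, b = divmod(c - 1, NMBITS)
    return ((words[w] >> b) & 1) == 1

s = make_set([1, 60, 61, 64])
print(member(s, 61), member(s, 2))
```

With 64 characters and 60-bit words, codes 61 to 64 spill into the second word, which is why two words suffice only while the number of characters does not exceed twice the word length.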

Interpreter.  The object code for the interpreter consists of 3000
words excluding the space required for the tables. The interpreter
source code is machine-independent standard ANSI FORTRAN, with the
same exceptions found for the compiler.

The  tables  output  by  the  compiler  are  in  the  form  of  a BLOCK  DATA 
ANSI  FORTRAN  subprogram  which  is  to  be  compiled  and  loaded  with  the 
object  code  for  the  interpreter. 

Note:  For  maximal  time  efficiency,  the  routines  IN  and  ADVANC  should 
be  replaced  by  equivalent  optimized  machine-language  routines. 

The  source  for  the  compiler  is  on  file  COMPIL. 

The compiler is called by: CB, input, listing, errors, tables.

The  source  for  the  interpreter  is  on  file  NTD. 

The interpreter is called by: NTDB, input, listing, errors.

The  FSCAN  program  for  ANSI  FORTRAN  is  on  the  file  SCAN. 

The file SCANT was produced by a CB, SCAN, listing, errors, SCANT run.
(NTDB  is  produced  from  the  compilation  of  SCANT  and  NTD) 

To use the scanner, NTDB,
for each desired token, call SCANNER;
the token will be returned in /TOKENC/,
where TKNTYP is the type of the token
      TKNCHR (30) is an array of A1 characters (the sub-rosa info)
         where TKNCHR (1) ... TKNCHR (ITKNCH) are the characters
      TOKERR is a logical flag which is true iff the token
         contains an error
Appendix

Syntax of FSCAN programs

SCANNER     -> 'SCANNER' GOAL_SYMBOL ':' (RULE ';')+ 'END' GOAL_SYMBOL ;

RULE        -> NONTERMINAL ('->' REG_EXPRN ('=>' ACTION)?)+
            -> VARIABLE '=' REG_EXPRN
            -> SCANNER ;

REG_EXPRN   -> REG_TERM list '|' ;

REG_TERM    -> REG_FACTOR+ ;

REG_FACTOR  -> REG_PRIMARY ('*' | '+' | '?')?
            -> 'NOT' REG_PRIMARY
            -> REG_PRIMARY 'LIST' REG_PRIMARY ;

REG_PRIMARY -> '(' REG_EXPRN ? ')'
            -> NONTERMINAL list 'ELSE'
            -> NONTERMINAL ('**' | '?*') EXPONENT
            -> TERMINAL ;

ACTION      -> SCREENER | TERMINAL | '<INTEGER>' ;

EXPONENT    -> VARIABLE | '<INTEGER>' ;

GOAL_SYMBOL -> '<NAME>' ;

NONTERMINAL -> '<NAME>' ;

VARIABLE    -> '<NAME>' ;

SCREENER    -> '<NAME>' ;

TERMINAL    -> '<KEPT_STRING>' | '<DELETED_STRING>' ;

Note: "A?" is equivalent to "(A|e)"

"A list B" is equivalent to "A (B A)*"